Nesterov acceleration
Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms
We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter. The two variables continuously mix following a linear ordinary differential equation and take gradient steps at random times. This continuized variant benefits from the best of the continuous and the discrete frameworks: as a continuous process, one can use differential calculus to analyze convergence and obtain analytical expressions for the parameters; but a discretization of the continuized process can be computed exactly, with convergence rates similar to those of Nesterov's original acceleration. We show that the discretization has the same structure as Nesterov acceleration, but with random parameters. We provide continuized Nesterov acceleration under deterministic as well as stochastic gradients, with either additive or multiplicative noise. Finally, using our continuized framework and expressing the gossip averaging problem as the stochastic minimization of a certain energy function, we provide the first rigorous acceleration of asynchronous gossip algorithms.
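The structure described in the abstract — two variables that mix between gradient steps taken at the arrival times of a Poisson process — can be sketched as follows. This is a minimal illustration on a toy quadratic, with hand-picked mixing and step-size constants; it is not the paper's exact parameter choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy strongly convex quadratic f(x) = 0.5 * x^T A x with spectrum in [mu, L]
# (a hypothetical test problem, chosen only for illustration).
d = 20
mu, L = 0.1, 1.0
A = np.diag(np.linspace(mu, L, d))

def grad(x):
    return A @ x

# Continuized-style iteration: gradient steps occur at the arrival times of a
# rate-1 Poisson process; between them, x and z mix toward each other. For a
# constant mixing rate a, integrating the linear ODE over an interval of length
# dt gives a convex combination with random weight theta = 1 - exp(-a * dt) --
# the "same structure as Nesterov, but with random parameters".
x = rng.standard_normal(d)
z = x.copy()
a = np.sqrt(mu / L)                     # illustrative mixing rate
f0 = 0.5 * x @ A @ x
for _ in range(500):
    dt = rng.exponential(1.0)           # random inter-arrival time
    theta = 1.0 - np.exp(-a * dt)       # random mixing weight
    y = x + theta * (z - x)             # x drifts toward z
    z = z + theta * (x - z)             # z drifts toward x
    g = grad(y)
    x = y - g / L                       # gradient step at the random time
    z = z - g / L

f_final = 0.5 * x @ A @ x               # far below f0 on this quadratic
```

The discrete loop can be computed exactly because the mixing ODE between arrival times has a closed-form solution; only the arrival times themselves are random.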
Analytical Study of Momentum-Based Acceleration Methods in Paradigmatic High-Dimensional Non-Convex Problems
In this work, we use dynamical mean field theory techniques to describe analytically the average dynamics of momentum-based acceleration methods in a prototypical non-convex model: the (spiked) matrix-tensor model. We derive a closed set of equations that describe the behaviour of heavy-ball momentum and Nesterov acceleration in the infinite-dimensional limit.
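As a concrete reminder of the first of the two dynamics being analyzed, here is a minimal heavy-ball iteration on a toy quadratic — a convex stand-in for illustration only, not the spiked matrix-tensor model, with the classical quadratic-optimal parameter choices:

```python
import numpy as np

# Toy strongly convex quadratic f(x) = 0.5 * x^T A x with spectrum in [mu, L]
# (illustrative stand-in; the paper studies a non-convex matrix-tensor model).
d = 5
mu, L = 0.1, 1.0
A = np.diag(np.linspace(mu, L, d))

def grad(x):
    return A @ x

# Heavy-ball momentum: x_{k+1} = x_k - h * grad(x_k) + beta * (x_k - x_{k-1}),
# with the classical parameters that are optimal for quadratics.
h = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

x_prev = x = np.ones(d)
f0 = 0.5 * x @ A @ x
for _ in range(300):
    x, x_prev = x - h * grad(x) + beta * (x - x_prev), x

f_final = 0.5 * x @ A @ x               # far below f0 on this quadratic
```

On the non-convex landscapes the paper studies, the same two-step recursion is the object whose high-dimensional average dynamics the DMFT equations track.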
On the Effectiveness of the z-Transform Method in Quadratic Optimization
Characterizing the convergence of real-valued or vector-valued sequences is a key theoretical problem in data science, where the sequence index typically corresponds to the number of iterations of an iterative algorithm (such as in optimization and signal processing) or the number of observations (as in statistics and machine learning). This characterization can be done in mostly two ways, asymptotically or non-asymptotically. In an asymptotic analysis, an asymptotic equivalent of the sequence is identified, which readily allows comparisons with other algorithms; however, without further analysis, the behavior at any finite time cannot be controlled. This is exactly what non-asymptotic analysis aims to achieve, by providing bounds that are valid even for a finite index, but then only providing bounds that cannot always be compared. While the two approaches have their own merits, in this paper, we focus on asymptotic analysis and sequences that tend to their limit at a sub-exponential rate that is a power of the sequence index. The main goal of this paper is to show how a classical tool from signal processing, control theory, and electrical engineering (Oppenheim et al., 1996), the z-transform method (Jury, 1964), can be used in this context with a striking efficiency at obtaining asymptotic equivalents for the class of algorithms that can be seen as iterations of potentially random linear operators in a Hilbert space. This includes gradient descent for quadratic optimization problems as well as its accelerated and stochastic variants (Nesterov, 2018), Landweber iterations in inverse problems (Benning and Burger, 2018), or gossip algorithms in distributed computing (Boyd et al., 2006).
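To see how pole locations of the z-transform encode asymptotic equivalents, consider a scalar example (an illustration in the spirit of the method, not taken from the paper): the recursion $x_{k+1} = \rho x_k + b$ with $|\rho| < 1$, e.g. gradient descent on a one-dimensional quadratic with a constant forcing term.

```latex
% Unilateral z-transform X(z) = \sum_{k \ge 0} x_k z^{-k} of x_{k+1} = \rho x_k + b:
z\bigl(X(z) - x_0\bigr) = \rho\, X(z) + \frac{b z}{z - 1}
\quad\Longrightarrow\quad
X(z) = \frac{z\, x_0}{z - \rho} + \frac{b z}{(z - 1)(z - \rho)}.
% Partial fractions place simple poles at z = \rho and z = 1; inverting term by term,
x_k = \Bigl(x_0 - \frac{b}{1 - \rho}\Bigr)\rho^k + \frac{b}{1 - \rho}.
```

The pole at $z = 1$ yields the limit $b/(1-\rho)$ and the pole at $z = \rho$ the exponentially vanishing correction; a pole of order $m$ at $z = 1$ instead produces terms proportional to $k^{m-1}$, which is how the power-of-the-index asymptotic equivalents studied in the paper arise.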
Nesterov acceleration despite very noisy gradients
We present a generalization of Nesterov's accelerated gradient descent algorithm. Our algorithm (AGNES) provably achieves acceleration for smooth convex and strongly convex minimization tasks with noisy gradient estimates if the noise intensity is proportional to the magnitude of the gradient at every point. Nesterov's method converges at an accelerated rate if the constant of proportionality is below 1, while AGNES accommodates any signal-to-noise ratio. The noise model is motivated by applications in overparametrized machine learning. AGNES requires only two parameters in convex and three in strongly convex minimization tasks, improving on existing methods.
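The noise model the abstract describes — gradient noise whose intensity is proportional to the gradient's magnitude — can be sketched with classical Nesterov momentum on a toy quadratic. The parameters below are the standard strongly convex choices, not AGNES's (which is tuned to tolerate any signal-to-noise ratio); all constants here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy strongly convex quadratic f(x) = 0.5 * x^T A x with spectrum in [mu, L].
d = 10
mu, L = 0.1, 1.0
A = np.diag(np.linspace(mu, L, d))

def noisy_grad(x, sigma=0.3):
    g = A @ x
    # multiplicative noise: intensity proportional to the gradient magnitude,
    # so the noise shrinks as the iterates approach the minimizer
    return g * (1.0 + sigma * rng.standard_normal(d))

# Classical strongly convex Nesterov momentum (illustrative parameter choice).
kappa = L / mu
beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)

x_prev = x = np.ones(d)
f0 = 0.5 * x @ A @ x
for _ in range(1000):
    y = x + beta * (x - x_prev)
    x_prev, x = x, y - noisy_grad(y) / L

f_final = 0.5 * x @ A @ x               # still converges: the noise decays with the gradient
```

With additive noise of fixed variance the iterates would stall at a noise floor; here the floor itself shrinks with the gradient, which is the regime (motivated by overparametrized models) in which acceleration can survive.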
Nesterov Acceleration for Ensemble Kalman Inversion and Variants
Vernon, Sydney, Bach, Eviatar, Dunbar, Oliver R. A.
Ensemble Kalman inversion (EKI) is a derivative-free, particle-based optimization method for solving inverse problems. It can be shown that EKI approximates a gradient flow, which allows the application of methods for accelerating gradient descent. Here, we show that Nesterov acceleration is effective in speeding up the reduction of the EKI cost function on a variety of inverse problems. We also implement Nesterov acceleration for two EKI variants, unscented Kalman inversion and ensemble transform Kalman inversion. Our specific implementation takes the form of a particle-level nudge that is demonstrably simple to couple in a black-box fashion with any existing EKI variant algorithm, comes at no additional computational expense, and requires no additional tuning hyperparameters. This work shows a pathway for future research to translate advances in gradient-based optimization into advances in gradient-free Kalman optimization.
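A particle-level nudge of this kind can be sketched on a small linear inverse problem: each particle is shifted by a Nesterov-style momentum term before the usual (perturbed-observation) Kalman update, which requires no extra forward-model evaluations. The problem setup, the momentum schedule, and all constants below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear inverse problem: recover u_true from data y = G @ u_true,
# with observation-noise covariance Gamma.
d_u, d_y, J = 2, 3, 50                  # parameter dim, data dim, ensemble size
G = rng.standard_normal((d_y, d_u))
u_true = np.array([1.0, -2.0])
y = G @ u_true
Gamma = 0.01 * np.eye(d_y)

U = rng.standard_normal((J, d_u))       # initial ensemble, one particle per row
U_prev = U.copy()

def mean_misfit(U):
    return np.linalg.norm(G @ U.mean(axis=0) - y)

m0 = mean_misfit(U)
for k in range(1, 31):
    beta = (k - 1) / (k + 2)            # Nesterov-style momentum schedule
    V = U + beta * (U - U_prev)         # particle-level nudge before the update
    GV = V @ G.T                        # forward-model evaluations, shape (J, d_y)
    dU = V - V.mean(axis=0)
    dG = GV - GV.mean(axis=0)
    C_ug = dU.T @ dG / J                # parameter-data cross-covariance
    C_gg = dG.T @ dG / J                # data covariance
    K = C_ug @ np.linalg.inv(C_gg + Gamma)
    Y = y + rng.multivariate_normal(np.zeros(d_y), Gamma, size=J)  # perturbed obs
    U_prev = U
    U = V + (Y - GV) @ K.T              # Kalman update of every particle

m_final = mean_misfit(U)                # substantially below m0 on this example
```

Because the nudge only linearly recombines the current and previous ensembles, it couples in a black-box fashion with any update rule of this shape, which is the point the abstract emphasizes.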